Goto

Collaborating Authors

 Gangwon-do




Robust Causal Directionality Inference in Quantum Inference under MNAR Observation and High-Dimensional Noise

Kang, Joonsung

arXiv.org Machine Learning

In quantum mechanics, observation actively shapes the system, paralleling the statistical notion of Missing Not At Random (MNAR). This study introduces a unified framework for \textbf{robust causal directionality inference} in quantum engineering, determining whether relations are system$\to$observation, observation$\to$system, or bidirectional. The method integrates CVAE-based latent constraints, MNAR-aware selection models, GEE-stabilized regression, penalized empirical likelihood, and Bayesian optimization. It jointly addresses quantum and classical noise while uncovering causal directionality, with theoretical guarantees for double robustness, perturbation stability, and oracle inequalities. Simulation and real-data analyses (TCGA gene expression, proteomics) show that the proposed MNAR-stabilized CVAE+GEE+AIPW+PEL framework achieves lower bias and variance, near-nominal coverage, and superior quantum-specific diagnostics. This establishes robust causal directionality inference as a key methodological advance for reliable quantum engineering.




Detecting LLM-Generated Korean Text through Linguistic Feature Analysis

Park, Shinwoo, Kim, Shubin, Kim, Do-Kyung, Han, Yo-Sub

arXiv.org Artificial Intelligence

The rapid advancement of large language models (LLMs) increases the difficulty of distinguishing between human-written and LLM-generated text. Detecting LLM-generated text is crucial for upholding academic integrity, preventing plagiarism, protecting copyrights, and ensuring ethical research practices. Most prior studies on detecting LLM-generated text focus primarily on English text. However, languages with distinct morphological and syntactic characteristics require specialized detection approaches. Their unique structures and usage patterns can hinder the direct application of methods primarily designed for English. Among such languages, we focus on Korean, which has relatively flexible spacing rules, a rich morphological system, and less frequent comma usage compared to English. We introduce KatFish, the first benchmark dataset for detecting LLM-generated Korean text. The dataset consists of text written by humans and generated by four LLMs across three genres. By examining spacing patterns, part-of-speech diversity, and comma usage, we illuminate the linguistic differences between human-written and LLM-generated Korean text. Building on these observations, we propose KatFishNet, a detection method specifically designed for the Korean language. KatFishNet achieves an average of 19.78% higher AUROC compared to the best-performing existing detection method. Our code and data are available at https://github.com/Shinwoo-Park/detecting_llm_generated_korean_text_through_linguistic_analysis.


MC2SleepNet: Multi-modal Cross-masking with Contrastive Learning for Sleep Stage Classification

Na, Younghoon, Ahn, Hyun Keun, Lee, Hyun-Kyung, Lee, Yoongeol, Oh, Seung Hun, Kim, Hongkwon, Lee, Jeong-Gun

arXiv.org Artificial Intelligence

Sleep profoundly affects our health, and sleep deficiency or disorders can cause physical and mental problems. Despite significant findings from previous studies, challenges persist in optimizing deep learning models, especially in multi-modal learning for high-accuracy sleep stage classification. Our research introduces MC2SleepNet (Multi-modal Cross-masking with Contrastive learning for Sleep stage classification Network). It aims to facilitate the effective collaboration between Convolutional Neural Networks (CNNs) and Transformer architectures for multi-modal training with the help of contrastive learning and cross-masking. Raw single channel EEG signals and corresponding spectrogram data provide differently characterized modalities for multi-modal learning. Our MC2SleepNet has achieved state-of-the-art performance with an accuracy of both 84.6% on the SleepEDF-78 and 88.6% accuracy on the Sleep Heart Health Study (SHHS). These results demonstrate the effective generalization of our proposed network across both small and large datasets.


A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions

Yin, Hao, Parmar, Paritosh, Xu, Daoliang, Zhang, Yang, Zheng, Tianyou, Fu, Weiwei

arXiv.org Artificial Intelligence

Action Quality Assessment (AQA) -- the ability to quantify the quality of human motion, actions, or skill levels and provide feedback -- has far-reaching implications in areas such as low-cost physiotherapy, sports training, and workforce development. As such, it has become a critical field in computer vision & video understanding over the past decade. Significant progress has been made in AQA methodologies, datasets, & applications, yet a pressing need remains for a comprehensive synthesis of this rapidly evolving field. In this paper, we present a thorough survey of the AQA landscape, systematically reviewing over 200 research papers using the preferred reporting items for systematic reviews & meta-analyses (PRISMA) framework. We begin by covering foundational concepts & definitions, then move to general frameworks & performance metrics, & finally discuss the latest advances in methodologies & datasets. This survey provides a detailed analysis of research trends, performance comparisons, challenges, & future directions. Through this work, we aim to offer a valuable resource for both newcomers & experienced researchers, promoting further exploration & progress in AQA. Data are available at https://haoyin116.github.io/Survey_of_AQA/


Transparent Networks for Multivariate Time Series

Kim, Minkyu, Lee, Suan, Kim, Jinho

arXiv.org Artificial Intelligence

Transparent models, which are machine learning models that produce inherently interpretable predictions, are receiving significant attention in high-stakes domains. However, despite much real-world data being collected as time series, there is a lack of studies on transparent time series models. To address this gap, we propose a novel transparent neural network model for time series called Generalized Additive Time Series Model (GATSM). GATSM consists of two parts: 1) independent feature networks to learn feature representations, and 2) a transparent temporal module to learn temporal patterns across different time steps using the feature representations. This structure allows GATSM to effectively capture temporal patterns and handle dynamic-length time series while preserving transparency. Empirical experiments show that GATSM significantly outperforms existing generalized additive models and achieves comparable performance to black-box time series models, such as recurrent neural networks and Transformer. In addition, we demonstrate that GATSM finds interesting patterns in time series.


KULTURE Bench: A Benchmark for Assessing Language Model in Korean Cultural Context

Wang, Xiaonan, Yeo, Jinyoung, Lim, Joon-Ho, Kim, Hansaem

arXiv.org Artificial Intelligence

Large language models have exhibited significant enhancements in performance across various tasks. However, the complexity of their evaluation increases as these models generate more fluent and coherent content. Current multilingual benchmarks often use translated English versions, which may incorporate Western cultural biases that do not accurately assess other languages and cultures. To address this research gap, we introduce KULTURE Bench, an evaluation framework specifically designed for Korean culture that features datasets of cultural news, idioms, and poetry. It is designed to assess language models' cultural comprehension and reasoning capabilities at the word, sentence, and paragraph levels. Using the KULTURE Bench, we assessed the capabilities of models trained with different language corpora and analyzed the results comprehensively. The results show that there is still significant room for improvement in the models' understanding of texts related to the deeper aspects of Korean culture.